Longitudinal single-cell transcriptomics reveals distinct patterns of recurrence in acute myeloid leukemia

Background Acute myeloid leukemia (AML) is a heterogeneous and aggressive blood cancer that results from diverse genetic aberrations in the hematopoietic stem or progenitor cells (HSPCs) leading to the expansion of blasts in the hematopoietic system. The heterogeneity and evolution of cancer blasts can render therapeutic interventions ineffective in a yet poorly understood patient-specific manner. In this study, we investigated the clonal heterogeneity of diagnosis (Dx) and relapse (Re) pairs at genetic and transcriptional levels, and unveiled the underlying pathways and genes contributing to recurrence. Methods Whole-exome sequencing was used to detect somatic mutations and large copy number variations (CNVs). Single cell RNA-seq was performed to investigate the clonal heterogeneity between Dx-Re pairs and amongst patients. Results scRNA-seq analysis revealed extensive expression differences between patients and Dx-Re pairs, even for those with the same -presumed- initiating events. Transcriptional differences between and within patients are associated with clonal composition and evolution, with the most striking differences in patients that gained large-scale copy number variations at relapse. These differences appear to have significant molecular implications, exemplified by a DNMT3A/FLT3-ITD patient where the leukemia switched from an AP-1 regulated clone at Dx to a mTOR signaling driven clone at Re. The two distinct AML1-ETO pairs share genes related to hematopoietic stem cell maintenance and cell migration suggesting that the Re leukemic stem cell-like (LSC-like) cells evolved from the Dx cells. Conclusions In summary, the single cell RNA data underpinned the tumor heterogeneity not only amongst patient blasts with similar initiating mutations but also between each Dx-Re pair. Our results suggest alternatively and currently unappreciated and unexplored mechanisms leading to therapeutic resistance and AML recurrence. Supplementary Information The online version contains supplementary material available at 10.1186/s12943-022-01635-4.


AML samples
In total, 6 paired Dx-Re bone marrow aspirates from adult AML patients, aged between 31 and 69 and diagnosed with AML1-ETO (n=2 low risk cases) or FLT3-ITD (n=1 intermediate and n=3 high risk) were analyzed from the AMLSG BiO Registry study (NTC 01252485). The phenotypic characterization (AML1-ETO and FLT3-ITD) was performed at the reference laboratory of the German and Austrian AML Study Group (AMLSG). All patients received intensive standard induction chemotherapy with cytarabine and an anthracycline (7+3 regimen) followed by high-dose cytarabine consolidation cycles. Patient characteristics are summarized in Supplementary Table S1. Informed consent for both treatment and biobanking of leukemia samples according to the Declaration of Helsinki was given by all patients. Approval was obtained from the ethical review board of the University of Ulm (Ethikkommission der Universität Ulm).

Cell preparation
Bone marrow aspirates from AML patients were processed using density gradient centrifugation to isolate mononuclear cells, viably frozen in medium containing 10% DMSO and shipped with dry ice. Cryopreserved AML samples were thawed at 37℃. After washing with cold PBS, cells were resuspended in 500μl PBS with 0.2% human serum and incubated for 10 minutes on ice. After centrifugation and removing the supernatant, cells were resuspended in 200μl cold PBS with 20μl PE-conjugated CD33 and 20μl APC-conjugated CD34 (eBioscience). Cells were stained for 30 minutes on ice, washed with cold PBS with 0.5% BSA and resuspended in 500μl PBS with 0.5% BSA and 1μl 7-AAD (eBioscience). After incubation for 10 minutes, CD33/CD34+ cells were sorted into 384-well plate (BioRad) containing well-specific primers (100nl, 0.75pmol/μl) and 5μl mineral oil (Sigma-Aldrich) on the BD FACS Aria cell sorter. After sorting plates were sealed and span down for 2 minutes at 2000g, snap frozen on dry ice and stored at -80℃ until use.

Single cell SORT-seq
We applied SORT-seq 1 , a method that integrates single cell FACS sorting (Fluorescence-Activated Cell Sorter) with the CEL-Seq2 protocol 2 to measure gene expression at single cell resolution. External RNA Controls Consortium (ERCC) transcripts were spiked-in to detect empty wells, low quality mRNA and failed reactions. The frozen plates were centrifuged at 1400 RPM for 2 minutes at 4℃ before processing. Processing of single cell plates included first strand synthesis and barcoding in the 384 well plate. mRNA from the single cells were pooled and batch amplified by in vitro transcription (IVT) (Invitrogen # AM1334). Sequence libraries were prepared with Phusion High-Fidelity Polymerase (NEB).

Bulk RNA-seq
Total RNA was extracted using Quick-RNA Microprep Kit (Zymo Research) according to the manufacturer protocol with DNaseI treatment. The RNA concentration was quantified with Qubit Fluorometer (Invitrogen). RNA libraries were prepared with KAPA RNA HyperPrep Kit with RiboErase (HMR) kit (Roche) following the manufacturer's recommendations. RNAseq libraries were paired-end sequenced on an Illumina Nextseq 500 at an average depth of ~30M reads.

Detection of FLT3-ITD at diagnosis and relapse
The presence of FLT3-ITD was detected by DNA-based PCR followed by capillary electrophoresis. The detailed procedures were described previously 3 .

Sequencing and mapping
Single cell libraries were pair-end sequenced on an Illumina NextSeq500 at an average depth of ~30M reads per library and demultiplexed using bcl2fastq version 2.15.0.4 with default settings. We used STAR version 2.7.2b 4 to map the 42nt long read1 to human reference genome hg38. Next, we used UMI-tools 5 to reconstruct the gene by cell UMI count matrix from the BAM file.

Normalization, dimensionality reduction and cluster analysis
We used the Seurat v3 6 R-package for downstream analysis. First, low quality cells (genes detected < 500 or UMI count > 12,000 or mitochondrial UMIs > 30% or ERCC reads > 20%) were discarded (Supplementary Figure 2A). Ribosomal and mitochondrial genes were also discarded prior to normalization. Cells from all libraries were concatenated and log2 normalized. Next, we applied principal component analysis (PCA) on the 2,000 most variable genes to reduce the dimensionality of the dataset and retained the 50 components for cluster analysis and the identification of marker genes. Cluster analysis was run using the Louvain algorithm. Cluster markers were identified using the Seurat function FindAllMarkers with parameters min.pct=0.25, logfc.threshold=0.5 and only.pos=FALSE. Marker genes discriminating two clusters or Dx from Re cells were obtained usng the Seurat function FindMarkers with parameters min.pct=0.25, logfc.threshold=0.5, min.diff.pct=0.2 and only.pos=FALSE. All marker genes with adjusted p-value > 0.01 were discarded.

Whole exome sequencing analysis
We used the GATK toolkit version v4.2.0 7 to detect short somatic variants following the GATK best practices workflows "Data pre-processing for variant discovery" and "Somatic short variant discovery (SNVs + Indels)" with small modifications. Briefly, paired-end reads were aligned using BWA version 2.2.1 8 , discarding reads with MAPQ < 20. PCR duplicates were marked using Sambamba 0.8.0 markdup 9 and base quality scores were recalibrated using the GATK functions BaseRecalibrator and ApplyBQSR . Variants were called using the panel of normals (PON) and gnomAD VCF file provided in the GATK resource bundle in two modes: Dx and Re as tumor samples and Cr as a germline control, or all three samples as tumor only. The rationale for this approach is that variants present (at low frequency) in the Cr sample (due to minimal residual disease) are sometimes discarded as germline. Mutations were visualized using the maftools R-package 11 and listed in supplemental table 2.

Somatic copy number variation analysis
Somatic CNV were detected from Dx and Re whole exome libraries using GATK v4.2.0, following the GATK "Somatic copy number variant discovery" workflow. In brief, a CNV panel of normal (PON) was constructed from the complete remission (CR) samples and CNVs were called from the Dx and Re samples, using CR samples as patient-matched normal samples. Called CNV segments were used as input for inferred copy number analysis (iCNV) using single cell RNA-seq data (described below).

Inferred copy number variation analysis
Copy number variation was inferred by tiling the genome into 3 Mb windows. Gene expression counts for genes overlapping each window where summed. Windows were log normalized followed by Z-scoring of cells (mean=0, sd=1) and smoothing using a running median with k=3 windows. Data was visualized using the ComplexHeatmap R-package 12 .
To detect CNVs from altered allele frequencies at single cell resolution, we first modified the BAM files and prefixed each cell barcode with a sample specific pseudo-barcode. Next, Dx and Re BAM files were merged and heterozygous positions were detected using the program BAFExtract 13 and converted to VCF. This VCF was used to count UMIs supporting the REF or ALT allele at single cell level using CellSNP-lite 14 . To discard possible sequencing artifacts, somatic mutations and low coverage SNPS, we computed the binomial confidence interval (α=0.05) from the total REF and ALT counts at Dx; the timepoint that did not show the considered copy number aberrations. SNPs with a total depth < 20 UMIs or a binomial confidence interval that did not contain the expected AF=0.5 were discarded. Because the SNP counts are low on a single cell level, we aggregated all SNP counts inside a given CNV segment for each cell to increase statistical confidence. To this end, we first determined at each SNP whether it supported the tumor allele as the REF or ALT using the aggregated UMI counts from Dx and Re. UMIs supporting the tumor allele (ALT or REF) were subsequently summed over all heterozygous positions per cell and divided by the total UMI depth (REF + ALT) to obtain a "tumor allele frequency" per cell.